Regular Language Constrained Sequence Alignment Revisited
Authors
Abstract
Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, together with an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n²t³) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n²t³/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. Although this second solution does not improve the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
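To make the kind of dynamic program discussed in the abstract more concrete, here is a minimal sketch of a constrained global alignment. It is not the authors' algorithm: it assumes a deterministic automaton (a hypothetical `delta` dictionary with missing entries meaning "reject"), a simple match/mismatch/gap scoring scheme, and the problem statement in which some aligned segment must pair a substring of each input that the automaton accepts. The function name, parameters, and scoring values are all illustrative assumptions. With a DFA each cell holds at most t² state pairs, so this sketch runs in O(nmt²) time; the t⁴ and t³ factors quoted above concern the non-deterministic case, which the sketch does not handle.

```python
NEG = float("-inf")

def constrained_alignment_score(s1, s2, delta, start, accepting,
                                match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 over alignments containing
    a segment that aligns a substring of s1 with a substring of s2, both
    accepted by the DFA (delta, start, accepting). -inf if none exists."""
    n, m = len(s1), len(s2)
    # before[i][j]: constrained segment not yet opened
    # after[i][j]:  constrained segment already closed
    # inside[i][j]: {(p, q): score}, segment open, DFA state p on the
    #               s1 part read so far and q on the s2 part read so far
    before = [[NEG] * (m + 1) for _ in range(n + 1)]
    after = [[NEG] * (m + 1) for _ in range(n + 1)]
    inside = [[{} for _ in range(m + 1)] for _ in range(n + 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            # Alignment moves ending in cell (i, j): (di, dj, score).
            moves = []
            if i and j:
                moves.append((1, 1, match if s1[i - 1] == s2[j - 1] else mismatch))
            if i:
                moves.append((1, 0, gap))
            if j:
                moves.append((0, 1, gap))

            b = 0 if i == 0 and j == 0 else NEG
            a = NEG
            cur = {}
            for di, dj, sc in moves:
                pi, pj = i - di, j - dj
                b = max(b, before[pi][pj] + sc)
                a = max(a, after[pi][pj] + sc)
                for (p, q), val in inside[pi][pj].items():
                    # Advance each DFA only on the string that consumed a symbol.
                    p2 = delta.get((p, s1[i - 1])) if di else p
                    q2 = delta.get((q, s2[j - 1])) if dj else q
                    if p2 is None or q2 is None:
                        continue  # missing transition: this branch rejects
                    if val + sc > cur.get((p2, q2), NEG):
                        cur[(p2, q2)] = val + sc
            # Open the constrained segment at (i, j).
            if b > cur.get((start, start), NEG):
                cur[(start, start)] = b
            # Close it when both substrings are accepted.
            for (p, q), val in cur.items():
                if p in accepting and q in accepting:
                    a = max(a, val)
            before[i][j], inside[i][j], after[i][j] = b, cur, a
    return after[n][m]

# Hypothetical DFA accepting exactly the string "ab".
AB_DFA = {(0, "a"): 1, (1, "b"): 2}
print(constrained_alignment_score("aab", "abb", AB_DFA, 0, {2}))
```

In this example the unconstrained optimum is 1, but forcing the two occurrences of "ab" to be aligned with each other costs two gaps, so the constrained score drops to 0. The per-cell maximisation over state pairs in `inside` is the part that grows with the automaton size, which is the kind of subproblem the abstract's speed-up targets.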
Similar resources
Regular Expression Constrained Sequence Alignment
Given strings S1, S2, and a regular expression R, we introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between S1 and S2 over all alignments such that in these alignments there exists a segment where some substring s1 of S1 is aligned with some substring s2 of S2, and both s1 and s2 match R, i.e. s1, s2 ∈ L(R) where L(R) is the reg...
SA-REPC - Sequence Alignment with Regular Expression Path Constraint
In this paper, we define a novel variation on the constrained sequence alignment problem, the Sequence Alignment with Regular Expression Path Constraint problem, in which the constraint is given in the form of a regular expression. Our definition extends and generalizes the existing definitions of alignment-path constrained sequence alignments to the expressive power of regular expressions. We ...
Efficient Algorithms for Regular Expression Constrained Sequence Alignment
Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression cons...
The Evaluation of a Stochastic Regular Motif Language for Protein Sequences
A probabilistic regular motif language for protein sequences is evaluated. SRE-DNA is a stochastic regular expression language that combines characteristics of regular expressions and stochastic representations such as Hidden Markov Models. To evaluate its expressive merits, genetic programming is used to evolve SRE-DNA motifs for aligned sets of protein sequences. Different constrained grammat...
Multiple Sequence Alignments with Regular Expression Constraints on a Cloud Service System
Multiple sequence alignments with constraints are of priority concern in computational biology. Constrained sequence alignment incorporates the domain knowledge of biologists into sequence alignments such that the user-specified residues/segments are aligned together according to the alignment results. A series of constrained multiple sequence alignment tools have been developed in relevant lit...
Journal:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 18, Issue 5
Pages -
Publication date: 2010